Dual Regression

By Richard Spady and Sami Stouli 
Summary 

We propose an alternative ('dual regression') to the quantile regression process for the global es- 
timation of conditional distribution functions under minimal assumptions. Dual regression pro- 
vides all the interpretational power of the quantile regression process while largely avoiding the 
need for 'rearrangement' to repair the intersecting conditional quantile surfaces that quantile re- 
gression often produces in practice. Dual regression can be appropriately modified to provide full 
structural distribution function estimates of the single equation instrumental variables model; this 
and similar extensions have implications for the analysis of identification in econometric models 
of endogeneity. 

1. Introduction 

Let $Y$ be a scalar random variable with continuous support and $X$ a vector random variable with
continuous or discrete support. Then the conditional distribution function of $Y$ given $X$, written
$U = F(Y \mid X)$, has three properties: (1) $U$ is standard uniform, (2) $U$ is independent of $X$, and
(3) $F(Y = y \mid X = x)$ is strictly increasing in $y$ for any value $x$ of $X$. We will refer to these three
properties as "uniformity", "independence" and "monotonicity".
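The three properties can be checked numerically in a case where $F(Y \mid X)$ is known in closed form. The sketch below is illustrative only: the location-scale model `y = 1 + 2x + (1 + x)e`, the uniform design for `x` and the Gaussian `e` are assumptions made for this example, not part of the paper.

```python
import random
from statistics import NormalDist, fmean

def simulate_u(n=20000, seed=0):
    """Draw (x, y) from a hypothetical location-scale model and return x and u = F(y | x)."""
    rng = random.Random(seed)
    phi = NormalDist()
    xs, us = [], []
    for _ in range(n):
        x = rng.random()                      # X ~ Uniform(0, 1), an assumption
        e = rng.gauss(0.0, 1.0)               # e ~ N(0, 1), independent of X
        y = 1.0 + 2.0 * x + (1.0 + x) * e     # scale 1 + x > 0, so y is increasing in e
        u = phi.cdf((y - (1.0 + 2.0 * x)) / (1.0 + x))   # F(y | x) for this model
        xs.append(x)
        us.append(u)
    return xs, us

def correlation(a, b):
    """Sample correlation coefficient of two equally long sequences."""
    ma, mb = fmean(a), fmean(b)
    cov = fmean([(ai - ma) * (bi - mb) for ai, bi in zip(a, b)])
    va = fmean([(ai - ma) ** 2 for ai in a])
    vb = fmean([(bi - mb) ** 2 for bi in b])
    return cov / (va * vb) ** 0.5

xs, us = simulate_u()
mean_u = fmean(us)                            # close to 1/2 under uniformity
second_moment_u = fmean([u * u for u in us])  # close to 1/3 under uniformity
corr_xu = correlation(xs, us)                 # close to 0 under independence
```

Zero correlation is only a necessary condition for independence, so this is a sanity check rather than a test of the second property.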

Supposing that we have a sample of $n$ points $\{x_i, y_i\}$ drawn from the joint distribution $F(Y, X)$,
how might we estimate the $n$ values $u_i = F(Y = y_i \mid X = x_i)$ using only the requirement that
the estimate displays uniformity, independence, and monotonicity? We explore this question by
formulating a sequence of mathematical programming problems that embodies these requirements
and that generalizes the dual formulation of the quantile regression problem. Although
this 'generalized dual'¹ formulation seeks only to find the $n$ values $u_i = F(Y = y_i \mid X = x_i)$, its
dual (the primal, so to speak) shows that the assignment of these $n$ values follows a combination
of location-scale representations, the simplest element of which is a linear heteroscedastic
model. Even this simplest representation, like the quantile regression process, provides a complete
estimate of $F(Y \mid X)$. Moreover, it is largely free of 'quantile-crossing' problems that the
quantile regression process sometimes encounters in practice.

Acknowledgement: We are grateful to Jelmer Ypma for his help with Ipoptr. Authors' affiliations: Richard Spady: Department 
of Economics, Johns Hopkins University and CeMMAP. Sami Stouli: Department of Economics, University College London and 
CeMMAP. This first version October 25, 2012. 

¹ The use of 'dual' is also motivated by the more general observation that the estimation problem for a statistical model with
parameter $\theta$ and stochastic element $e$ is usually formulated in terms of a procedure that obtains $\theta$ directly and $e$ as a byproduct
that follows from a calculation from the representation evaluated at a specific value of $\theta$. Here we turn that process around,
obtaining $e$ first (from e.g. a mathematical programming problem) and 'backing out' $\theta$ afterwards, if at all.
Because the methods we propose stand as a sort of 'drop-in' replacement for quantile regression,
the extensions of quantile regression methods to other problem areas have analogs in our meth- 
ods. In particular, we are able to provide estimates of the single equation instrumental variables 
model that are free of the complexities and infelicities found in current quantile regression based 
methods while retaining the same interpretation. 

In the following Section we outline our method and discuss its interpretation. Section 3 considers
a simple example, the classic data of Engel (Koenker (2005)) on household income and
food expenditure. We compare our method to the 'rearrangement' method for quantile regression
proposed by Chernozhukov et al. (2009, 2010). Section 4 shows how the basic method can be
used to estimate a full 'structural' conditional distribution function under the assumptions of the
single equation instrumental variables model that is popular in econometrics. This is done without
recourse to either rearrangement or iterative methods, and considerably curtails the role
of arbitrary judgements inherent to previous methods such as those found in Chernozhukov &
Hansen (2005) and Chernozhukov & Hansen (2006). A final section summarizes and speculates.

In the interest of brevity we eschew technical details, particularly asymptotic distribution theory,
and provide only a cursory consideration of the generality of the formulation.



2. Basics 



The dual problem of the (linear) .5 quantile regression of $y$ on $X$ is (cf. Koenker (2005), p. 87,
equation 3.12):

$$\max_u \left\{ y^\top u \mid X^\top (u - \tfrac{1}{2}) = 0,\ u \in [0, 1]^n \right\}, \tag{1}$$

where y is an (n x 1) vector of dependent variable values and X is an (n x k) matrix of ex- 
planatory variable values that includes an intercept. 

The solution to problem (1) produces values of $u$ that are largely 0 and 1, with $k$ sample points
being assigned $u$ values that are neither 0 nor 1. The points that are assigned 1 fall above the median
quantile regression plane; the points receiving 0's fall below; and the remaining points fall on the
median quantile regression plane. One direction of extension of equation (1) is to replace the
"1/2" with values $\alpha$ that fall between 0 and 1 to obtain the $\alpha$ quantile regression.

Another extension is to augment problem (1) by adding $k$ more constraints:

$$\max_u \left\{ y^\top u \mid X^\top (u - \tfrac{1}{2}) = 0,\ X^\top (u^2 - \tfrac{1}{3}) = 0,\ u \in [0, 1]^n \right\}. \tag{2}$$

Apparently the solution to (1) does not satisfy (2): the variance of $u$ (around 0) in the solution to
(1) is approximately $\tfrac{1}{2}$, not $\tfrac{1}{3}$. To satisfy program (2), the $u$'s have to be moved off of $\{0\}, \{1\}$.
Since $X$ contains an intercept, the sample moments of $u$ and $u^2$ will be $\tfrac{1}{2}$ and $\tfrac{1}{3}$; $u$ and $u^2$ (net of
these means) will be orthogonal to the components of $X$, relations that are necessary but not sufficient
for uniformity and independence.
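The two moment targets can be traced back to the standard uniform distribution. The following short calculation is a worked check, assuming (as the surrounding text indicates) that the comparison is between the second moment of a uniform variable and that of a 0/1-valued solution:

```latex
\begin{align*}
E[U]   &= \int_0^1 u \, du   = \tfrac{1}{2}, \\
E[U^2] &= \int_0^1 u^2 \, du = \tfrac{1}{3}, \\
\frac{1}{n} \sum_{i=1}^n u_i^2 &= \frac{1}{n} \sum_{i=1}^n u_i \approx \tfrac{1}{2}
  \quad \text{when } u_i \in \{0, 1\}.
\end{align*}
```

The last line holds because $u_i^2 = u_i$ for 0/1 values, so a solution concentrated on the endpoints cannot match the second-moment target of $\tfrac{1}{3}$.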

Both systems (1) and (2) impose monotonicity by simply correlating $y$ and $u$. It is worth noting
that a violation of monotonicity requires there to be two observations that share the same
$X$ values but have different $y$ values, with the lower of the two $y$ values having the (weakly)
higher value of $u$. But a 'solution' characterized by such a violation could be improved upon by
exchanging the $u$ assignments. Hence the correlation criterion of systems (1) and (2) suffices to
impose monotonicity. However, system (1) is dual to a linear program well-known to have solutions
at which $k$ observations are interpolated when $k$ parameters are being estimated, i.e. the
hyperplanes obtained by regression quantiles must interpolate $k$ observations.
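The exchange argument can be illustrated by brute force. The sketch below takes a handful of observations that are assumed to share the same x value, fixes the set of u values to be handed out, and searches all assignments for the one maximizing the correlation criterion `sum(y_i * u_i)`. The data are made up for the illustration.

```python
from itertools import permutations

def best_assignment(y, u_values):
    """Return the assignment of u_values to observations that maximizes sum(y_i * u_i)."""
    def objective(perm):
        return sum(yi * ui for yi, ui in zip(y, perm))
    return list(max(permutations(u_values), key=objective))

# Hypothetical observations sharing one x value, and the u values to distribute.
y = [3.0, 1.0, 2.0, 5.0]
u_values = [0.1, 0.4, 0.6, 0.9]

assignment = best_assignment(y, u_values)

# The maximizer orders u in the same way as y: larger y receives larger u.
ranked_pairs = sorted(zip(y, assignment))
is_monotone = all(a[1] < b[1] for a, b in zip(ranked_pairs, ranked_pairs[1:]))
```

The exhaustive search is only practical for tiny examples; it is used here because it makes no appeal to the result it is meant to illustrate.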

The goal of assigning a $u$ value to each observation such that uniformity, independence, and
monotonicity are achieved could equally well be achieved by assigning a value $e \in \mathbb{R}$ to each
observation, where $e$ obeys the independence and monotonicity requirements, but where $e$ is given
by $F^{-1}(u)$ for some distribution function $F$. Such an $e$ solution is transformed into a corresponding
$u$ solution by taking $u = F(e)$; without loss of generality we can take $F$ to correspond to
a distribution with zero mean and unit variance. Doing this, the problem corresponding to (2)
becomes:

$$\max_e \left\{ y^\top e \mid X^\top e = 0,\ X^\top (e^2 - 1) = 0 \right\}, \tag{3}$$

where some simplification (particularly in computation) is obtained since e can take on any real 
value (whereas u is restricted to [0,1]). 

It is natural to take $u = F_n(e)$, the empirical cdf of $e$, thereby imposing uniformity to high
precision even at small $n$.
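Mapping a solution vector e to u through its empirical cdf takes only a few lines. The sketch below uses the convention F_n(t) = (number of e_j <= t) / n; other conventions, such as dividing by n + 1, are equally possible and the choice here is an assumption.

```python
from bisect import bisect_right

def empirical_cdf_values(e):
    """Return u_i = F_n(e_i), the share of values in e that are <= e_i."""
    n = len(e)
    sorted_e = sorted(e)
    return [bisect_right(sorted_e, value) / n for value in e]

e = [0.3, -1.2, 2.0, 0.0]          # hypothetical solution values
u = empirical_cdf_values(e)
```

With distinct values the resulting u's are exactly the grid 1/n, 2/n, ..., 1 in some order, which is the sense in which uniformity holds even at small n.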

2.1. The Dual Problem
The solution to the problem in equation (3) is easily found from the Lagrangian

$$\mathcal{L} = \sum_{i=1}^n y_i e_i - \lambda_1 \cdot \sum_{i=1}^n x_i e_i - \frac{1}{2} \lambda_2 \cdot \sum_{i=1}^n x_i (e_i^2 - 1). \tag{4}$$

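Differentiating with respect to $e_i$, we obtain $n$ first-order conditions:

$$\frac{\partial \mathcal{L}}{\partial e_i} = y_i - \lambda_1 \cdot x_i - (\lambda_2 \cdot x_i) e_i = 0. \tag{5}$$

Keeping in mind that $x_i$ is a $k$ component vector (and thus so are the Lagrange multipliers $\lambda_1$
and $\lambda_2$) we obtain for each $e_i$:

$$e_i = \frac{y_i - \lambda_1 \cdot x_i}{\lambda_2 \cdot x_i}, \tag{6}$$

which is of the familiar location-scale form:

$$e_i = \frac{y_i - \mu(x_i)}{\sigma(x_i)}, \tag{7}$$

with the functions $\mu(x)$ and $\sigma(x)$ being linear in $x$.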
Another view is obtained by writing

$$y_i = \lambda_1 \cdot x_i + (\lambda_2 \cdot x_i) e_i, \tag{8}$$

so that

$$\begin{aligned}
y_i &= (\lambda_1 + \lambda_2 e_i) \cdot x_i \\
    &= (\lambda_1 + \lambda_2 F^{-1}(u_i)) \cdot x_i \\
    &= \beta(u_i) \cdot x_i,
\end{aligned} \tag{9}$$

a standard quantile regression representation.
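The passage from the location-scale form to the quantile regression representation can be traced numerically. The sketch below takes F to be the standard normal distribution and uses made-up multiplier values; both are assumptions for illustration. It computes the standardized residual, maps it to u, and recovers y from the functional coefficient evaluated at u.

```python
from statistics import NormalDist

def dot(a, b):
    """Inner product of two equally long sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))

def residual(y_i, x_i, lam1, lam2):
    """Location-scale residual e_i = (y_i - lam1.x_i) / (lam2.x_i)."""
    return (y_i - dot(lam1, x_i)) / dot(lam2, x_i)

def beta(u, lam1, lam2, f_inv):
    """Functional coefficient beta(u) = lam1 + lam2 * F^{-1}(u)."""
    q = f_inv(u)
    return [l1 + l2 * q for l1, l2 in zip(lam1, lam2)]

# Hypothetical multipliers and one observation; x_i includes an intercept.
lam1 = [1.0, 2.0]
lam2 = [0.5, 0.25]
x_i = [1.0, 4.0]
y_i = 10.5

normal = NormalDist()
e_i = residual(y_i, x_i, lam1, lam2)      # (10.5 - 9.0) / 1.5
u_i = normal.cdf(e_i)                     # u = F(e) with F standard normal (assumption)
y_back = dot(beta(u_i, lam1, lam2, normal.inv_cdf), x_i)
```

The round trip returns the original y because the same F is used in both directions; in the method itself the empirical distribution of e plays this role.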



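2.2. Method of Moments Representation

While the solution for $e$ can be obtained directly by solving the mathematical program (3),
knowledge that the solution obeys equation (6) can be exploited to write estimating equations
for $\lambda$ in the form

$$\sum_{i=1}^n x_i \left( \frac{y_i - \lambda_1 \cdot x_i}{\lambda_2 \cdot x_i} \right) = 0, \qquad
\sum_{i=1}^n x_i \left[ \left( \frac{y_i - \lambda_1 \cdot x_i}{\lambda_2 \cdot x_i} \right)^2 - 1 \right] = 0. \tag{10}$$

The computation of the asymptotic distribution of the estimator of $\lambda$ follows straightforwardly
from this characterization.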
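In the special case where the only regressor is the intercept, the estimating equations can be solved by hand: the first requires the residuals to sum to zero and the second requires their squares to average one. The sketch below covers only that intercept-only case, which is a simplification chosen for illustration and not the general solver; the positive root for the scale is taken because it corresponds to the maximum of the correlation criterion.

```python
from math import sqrt
from statistics import fmean

def intercept_only_dual_regression(y):
    """Solve sum(e_i) = 0 and sum(e_i^2 - 1) = 0 with e_i = (y_i - lam1) / lam2."""
    lam1 = fmean(y)                                        # location multiplier
    lam2 = sqrt(fmean([(v - lam1) ** 2 for v in y]))       # scale multiplier, positive root
    e = [(v - lam1) / lam2 for v in y]
    return lam1, lam2, e

y = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]               # hypothetical outcomes
lam1, lam2, e = intercept_only_dual_regression(y)

first_moment = sum(e)                                      # first estimating equation
second_moment = sum(v * v - 1.0 for v in e)                # second estimating equation
```

Here the multipliers are simply the sample mean and the (divisor n) standard deviation of y, so e is the standardized outcome. With regressors the equations no longer decouple and a numerical solver is needed.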



2.3. Generalization
The two approaches in systems (2) and (3) share the common structure

$$\max_e \ y^\top e \quad \text{s.t.} \quad X^\top \tilde{h}_j(e) = 0, \quad j = 0, \ldots, J, \tag{11}$$

which gives rise to the first-order conditions

$$y_i = \sum_{j=0}^{J} (\lambda_j \cdot x_i) h_j(e_i) \tag{12}$$

$$X^\top \tilde{h}_j(e) = 0, \tag{13}$$

where $e$ is a stochastic element and $\tilde{h}_j(e)$ is the antiderivative of $h_j(e)$. In (11) both the
monotonicity element (the objective) and the independence element (the constraints) require further
generalization, analysis, and specification. However, already (12) suggests a representation for
$Y \mid X$.

If we represent the stochastic structure of $Y \mid X$ as

$$Y = H(X, e) = H_X(e), \tag{14}$$

and define

$$\tilde{H}_x(e) = \int H_x(e) \, de, \tag{15}$$

then the monotonicity of $H_x(e)$ in $e$ guarantees the convexity of $\tilde{H}_x(e)$. At each value $x$, $\tilde{H}_x(e)$
is a simple convex function of scalar $e$; the slope of this function gives the value of $Y$ corresponding
to $e$ at $X = x$. Thus $F(Y \mid X)$ corresponds to a collection of convex functions, with one
element of this collection for each value of $X$, together with a single random variable $e$ whose
distribution is common to all the convex functions.²

Now suppose we are faced with an $(X, Y)$ sample and we are tasked with assigning a value of
$e$ to each observation. What do we need to know about $\tilde{H}_x(e)$ in order to do this correctly 'in
the limit'?

Solving the (infeasible) optimization problem:

$$\max_e \ y^\top e \quad \text{s.t.} \quad \sum_{i=1}^n \tilde{H}_{x_i}(e_i) = S, \tag{16}$$

generates the correct 'y-e' assignment provided $S$ is correctly chosen. Writing the Lagrangean

$$\mathcal{L} = y^\top e - \Lambda \left( \sum_{i=1}^n \tilde{H}_{x_i}(e_i) - S \right), \tag{17}$$

the $n$ associated first-order conditions are:

$$\frac{\partial \mathcal{L}}{\partial e_i} = y_i - \Lambda H_{x_i}(e_i) = 0,$$

so that $\Lambda = 1$ when $S$ is set correctly. This demonstrates that maximizing $y^\top e$ generally suffices
to match $e$'s to $y$'s, regardless of the form of $H_x(e)$.



Problem (16) is infeasible because neither $\tilde{H}_x(e)$ nor $S$ is known. However, each of the convex
functions $\tilde{H}_x(e)$ can be represented as a linear combination of $J$ convex basis functions, the
coefficients of which depend on $x$:

$$\tilde{H}_{x_i}(e) = \sum_j \beta_j(x_i) \tilde{h}_j(e) = \beta_j(x_i) \cdot \tilde{h}_j(e). \tag{18}$$

Since $X$ is independent of $e$, $E[\beta_j(x) \cdot \tilde{h}_j(e)] = E[\beta_j(x)] \cdot E[\tilde{h}_j(e)] = m(x)$; thus if both
component expectations are known or if one is equal to zero, then $m(x)$ is known.

Though in applications it will turn out to be useful to follow a hybrid path in which $E[\tilde{h}_j(e)] = 0$
for some $j$ but not others, let us impose this restriction for all $j$ initially.

If $E[\tilde{h}_j(e)] = 0$ for all $j$ then $S = 0$. The Lagrangean for the 'y-e' assignment problem is

$$\begin{aligned}
\mathcal{L} &= y^\top e - \Lambda \left( \sum_{i=1}^n \tilde{H}_{x_i}(e_i) \right) \\
            &= y^\top e - \Lambda \left( \sum_{i=1}^n \sum_{j} \beta_j(x_i) \tilde{h}_j(e_i) \right).
\end{aligned} \tag{19}$$

² Given one random variable $e$ with a particular distribution, we can always monotonically transform it to another random variable
and similarly transform the functions $H_x(e)$ so as to leave $F(Y \mid X)$ unchanged.
If we parameterize $\beta_j(x)$ as $\beta_j(x; \theta_j)$ and add $\theta$ to the choice variables of the optimization
problem we obtain the $\dim(\theta)$ additional constraints

$$\Lambda \sum_{i=1}^n \frac{\partial \beta_j(x_i; \theta_j)}{\partial \theta_j} \tilde{h}_j(e_i) = 0. \tag{20}$$

Recalling $\Lambda = 1$, the special case $\beta_j(x; \theta_j) = \theta_j \cdot x$ yields the familiar orthogonality constraint

$$\sum_{i=1}^n x_i \tilde{h}_j(e_i) = X^\top \tilde{h}_j(e) = 0. \tag{21}$$

Equation (21) can be directly appended to the objective $\max y^\top e$ to obtain an optimization
problem in which the Lagrange multiplier (written above in (12) as $\lambda$ in view of its Lagrange
interpretation) is $\theta$ (which emphasizes its 'parametric' interpretation).

The special case of 'simple dual regression' corresponds to $\tilde{h}_0(e) = e$, $\tilde{h}_1(e) = (e^2 - 1)/2$,
where the assumption that $E[\tilde{h}_j(e)] = 0$ is basically a normalization. The simple basis
$\{e, (e^2 - 1)/2\}$ is obviously 'impoverished' for the space of all convex functions, although quite practical
for many applications once the flexibility in the distribution of $e$ is taken into account.

Nonetheless, more general approaches suggest themselves. To keep the ensuing analysis simple
we will make an assumption which is restrictive but both practical and easily relaxed. Without
loss of generality let $X$ be centered so $E(X) = 0$ and with a decided loss of generality let

$$\beta_j(x) = \gamma_j + \lambda_j \cdot x, \tag{22}$$

so that $\beta_j(x)$ is linear and $x$ is without an intercept. This permits us to separate the constant
from $X$, which in turn allows us to conveniently exploit the implication of independence that the
correlation of $x$ with functions of $e$ is zero.



Using (22) simple dual regression has $n$ first-order conditions that can now be written:

$$y_i = \gamma_0 + \gamma_1 e_i + (\lambda_0 \cdot x_i) + (\lambda_1 \cdot x_i) e_i. \tag{23}$$

Extension of the basis using elements $\tilde{h}_j(e)$ results in these conditions becoming:

$$y_i = \gamma_0 + \gamma_1 e_i + (\lambda_0 \cdot x_i) + (\lambda_1 \cdot x_i) e_i + \sum_{j=2}^{J} (\lambda_j \cdot x_i) h_j(e_i) + \sum_{j=2}^{J} \gamma_j h_j(e_i). \tag{24}$$

Now in general the mean of $\tilde{h}_j(e)$ will be nonzero, say $m_j$. But had $m_j$ been known and subtracted
from $\tilde{h}_j(e)$, the resulting basis element would now have mean 0; consequently $\gamma_j$, as a
Lagrange multiplier, would be zero. A simple calculation confirms that neither the form of $h_j(e_i)$
appearing in (24) nor the value of $\lambda_j$ would be affected by this manipulation. Thus (24) reduces
to

$$y_i = \gamma_0 + \gamma_1 e_i + (\lambda_0 \cdot x_i) + (\lambda_1 \cdot x_i) e_i + \sum_{j=2}^{J} (\lambda_j \cdot x_i) h_j(e_i), \tag{25}$$

when $X$ is centered.
Equation (25) admits of the following interpretation. When $x = 0$, $y = \gamma_0 + \gamma_1 e$ and
$e = (y - \gamma_0)/\gamma_1$, so that $e$ is just a re-scaled version of the distribution of $y$ at $X = 0$. Since $e$ is
independent of $X$, transformations of this 'shape' of $e$ must suffice to produce $Y$ at other values of $X$.
The first two transformations, $(\lambda_0 \cdot x_i)$ and $(\lambda_1 \cdot x_i) e_i$, are translations of location and scale
which do not essentially affect the 'shape' of $Y$'s response to changes in $e$ at all. The additional
terms $(\lambda_j \cdot x_i) h_j(e_i)$ achieve that end.

2.4. Formal Duality

Writing

$$Y = H_x(e) \tag{26}$$

$$C(x, e) = \int H_x(e) \, de \tag{27}$$

$$L_x(y) = \sup_e \ \{ y \cdot e - C(x, e) \}, \tag{28}$$

we obtain at the final step the Legendre transform or convex conjugate of $C(x, e)$. $C(x, e)$ is
itself a convex function that represents $F(Y \mid X)$ once a distribution for $e$ is given. Its Legendre
transform therefore contains the same information.

In the case of simple dual regression the sequence is

$$y = (\lambda_0 \cdot x) + (\lambda_1 \cdot x) e \tag{29}$$

$$C(x, e) = (\lambda_0 \cdot x) e + \tfrac{1}{2} (\lambda_1 \cdot x)(e^2 - 1) \tag{30}$$
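For the simple dual regression case the remaining step, the Legendre transform itself, can be carried out in closed form. The derivation below is a worked sketch that assumes $\lambda_1 \cdot x > 0$, so that $C(x, e)$ is strictly convex in $e$ and the supremum is attained at an interior point $e^*$:

```latex
\begin{align*}
L_x(y) &= \sup_e \left\{ y e - (\lambda_0 \cdot x) e
          - \tfrac{1}{2} (\lambda_1 \cdot x)(e^2 - 1) \right\}, \\
0 &= y - (\lambda_0 \cdot x) - (\lambda_1 \cdot x) e^*
   \quad \Longrightarrow \quad
   e^* = \frac{y - \lambda_0 \cdot x}{\lambda_1 \cdot x}, \\
L_x(y) &= \frac{(y - \lambda_0 \cdot x)^2}{2 (\lambda_1 \cdot x)}
          + \tfrac{1}{2} (\lambda_1 \cdot x).
\end{align*}
```

The maximizer $e^*$ is the same location-scale residual that appears in the first-order conditions of the dual problem, which is the sense in which the transform carries the same information as $C(x, e)$.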

2.5. Discussion

If we designate problems such as (1) and (3) as (already) 'dual', then their solutions reveal a
corresponding 'primal'. Typically, the Lagrange multipliers of the dual appear as parameters in the
primal, and the primal has an interpretation as a data generating process (DGP). So perhaps not 
surprisingly the constraints on the construction of the stochastic elements have 'shadow values' 
that are parameters of a data generating representation. In this way the relation between identi- 
fication and estimation is made perspicuous: a parameter of the DGP is the Lagrange multiplier 
of a specific constraint on the construction of the stochastic element, so to specify that some pa- 
rameters are non-zero and others are zero is to say that some constraints are (in the large-sample 
limit) binding and others are not. 

Another way of expressing this is to say that when a primal corresponds to the DGP, additional 
moment conditions are superfluous: they will (in the limit) attract Lagrange multiplier values of 
zero and consequently not affect the value of the program (the objective function) nor the so- 
lution. In a sense, this is obvious: the parameters of the primal can typically be identified and 
estimated through an M-estimation problem that will generate k equations to be solved for the
k unknown parameters. Nonetheless, the recognition that the only moment conditions that con- 
tribute to enforcing the independence requirement are those whose imposition simultaneously 
reduces the objective function while providing multipliers that are coefficients in the stochastic
representation of $Y$ suggests the futility of portmanteau approaches (e.g. those based on
characteristic functions) to imposing independence.³ The dual formulation reveals that to specify the
binding moment conditions is to specify an (approximating) DGP representation, which then can
be extrapolated to provide estimates of objects of interest beyond the $n$ explicitly estimated
values $u_i = F(Y = y_i \mid X = x_i)$ that characterize the sample and the definition of the mathematical
program.

A further generalization is obtained by regarding $X$ as elementary regressors and defining
$W = W(X)$ as a vector formed by transformations of $X$; cf. Belloni et al. (2011) for a detailed
treatment of this series formulation in the context of quantile regression. Except in the notation
$F(Y \mid X)$, this type of series or sieve analysis in the foregoing is achieved by simply substituting
$W$ and $w$ for $X$ and $x$ throughout. The remainder of the discussion is unaffected.

The simplest form of these formulations, the linear heteroscedastic model of equation (8), has
been previously encountered in the quantile regression literature: see Koenker & Zhao (1994);
He (1997). The former considers the efficient estimation of this model via L-estimation while the latter
develops a restricted quantile regression method that prevents quantile crossing.



3. Engel's Data Revisited 



In this section we illustrate how dual regression can be applied to estimation of both conditional
quantile and distribution functions. We illustrate our method by revisiting the Engel curve
example of Koenker (2005). In this example, dual regression delivers monotone and well-behaved
estimates across the entire conditional quantile process. Dual regression finite-sample properties
are also studied by means of several Monte Carlo simulations. We show that dual regression
improves on standard linear quantile regression estimates and compares favorably with the
estimates obtained when rearrangement procedures (Chernozhukov et al. (2009, 2010)) are applied
to original conditional quantile functions. Furthermore, dual regression is shown to have tighter
confidence bands than standard quantile regression methods, therefore delivering well-behaved
and more precise estimates of both conditional quantile and distribution functions in the example
considered.



3.1. Empirical Example

The classical dataset collected by Engel consists of food expenditure and income measurements
for 235 households. The dataset has been studied in depth by Koenker (2005) by means of quantile
regression methods. Koenker (2005) shows that the dispersion of food expenditure increases
with household income, so that a location-scale model is particularly well-suited to the study 
of this data. We thus estimate the statistical relationship between food expenditure and income, 
assuming that it obeys a well specified linear heteroscedastic model with household income as a 
single regressor and food expenditure as outcome of interest. 



³ In the discussion on instrumental variables below we will demonstrate that the Lagrange multipliers of the moment constraints
are precisely the coefficients of the stochastic representation.
Figure 1: Dual regression estimate of the distribution of food expenditure conditional on household
income. Level sets of $u$ (solid lines) are plotted for a grid of values ranging from 0.1 to 0.9.
The projected 'shadow' level sets yield the respective conditional quantile functions appearing
on the xy-plane.



All computational procedures are implemented in the software R (R Development Core Team
(2007)). For dual regression we use Ipopt (Interior Point Optimizer), an open source software
package for large-scale nonlinear optimization (Waechter & Biegler (2006)), and its R interface
(Ypma (2011)). This has proven to be an effective and easy-to-use solver for the dual regression
constrained optimization problem given in (3). The reliable and elegant implementation of quantile
regression procedures in the package quantreg (Koenker (2007)) has been used to carry out our
comparisons.

Figure 1 illustrates our results and plots the estimated conditional distribution of food expenditure
given household income. The figure summarizes all the information delivered by dual regression.
First, the sequence of estimates $\{u_i\}_{i=1}^n$, where $u = F_n(e)$, the empirical distribution function
of $e$, is used in order to plot each observation $i$ in the $(x, y, u)$-space with predicted coordinates
$(x_i, y_i, u_i)$. Second, the predicted conditional distribution function, parametrized by the empirical
distribution $F_n\left( \frac{y - \hat{\lambda}_1 \cdot x}{\hat{\lambda}_2 \cdot x} \right)$, fully characterizes the statistical relationship between income and
food expenditure. Last, the solid lines give the $u$-level sets for a grid of values $\{0.1, \ldots, 0.9\}$.
Although nonstandard, this representation can be related to standard quantile regression plots
since the levels of the distribution function give the conditional quantiles of food expenditure
for each value of income. For instance, setting $u$ to the value 0.5, the median value of food
expenditure conditional on income is directly available from the projection of the corresponding
level set on the xy plane. These are the plotted "shadow" solid lines corresponding for each $u$
to the dual regression estimates of the conditional quantile functions of food expenditure given
household income.
Figure 2: Engel coefficient plots revisited. Dual (solid) and quantile (dashes) regression estimates
of the intercept (a) and income (b) coefficients as a function of the quantile index. Least squares 
estimates are also shown (dot-dash). 



It is apparent from Fig. 1 that the predicted conditional distribution function obtained by dual
regression is indeed endowed with all desired properties. Of particular interest is the fact that 
the estimated function satisfies the requirement of being monotone in food expenditure. Also, 
our estimates satisfy some basic smoothness requirements across probability levels, in the food 
expenditure values. It is important to note that this feature does not typically characterize esti- 
mates of the conditional quantile process by quantile regression methods, as conditional quantile 
functions are then estimated sequentially and independently of each other. The decreasing slope 
of the distribution function across values of income provides evidence that the data indeed follow 
a heteroscedastic generating process. This is the distributional counterpart of quantile functions 
having increasing slope across probability levels, a feature characterizing the conditional quan- 
tile functions on the xy plane and signalling increasing dispersion in food expenditure across 
household income values. 

Figure 2 compares our estimates of the functional intercept and covariate coefficients, $\beta(u)$ as
introduced in (9), with estimates obtained by quantile regression. Estimates of functional quantile
regression coefficients for Engel's data are given in Koenker (2005). For interpretational
purposes, we follow Koenker (2005) and estimate the functional coefficients after having recentered
household income. This avoids having to interpret the intercept as food expenditure for
households with zero income. After centering, the functional intercept coefficient can be interpreted
as the $u$-th quantile of food expenditure for households with mean income: this is Tukey's
"centercept". Fig. 2 shows the estimated quantile regression coefficients as a function of $u$. It
illustrates the fact that dual regression estimates are indeed smoother than their quantile regression
counterparts, the latter having a somewhat erratic behaviour around the dual regression estimates.
Figure 3: Scatterplots and dual (a), quantile (c) and rearranged quantile (e) regression estimates
of the conditional 0.1 to 0.9 quantile functions (solid lines) for Engel's data, and their rescaled
counterparts ((b),(d),(f)).
Figure 3 gives the more familiar quantile regression plots. The plots presented show scatterplots
of Engel's data as well as conditional quantile functions obtained by dual, quantile and rearranged
quantile regression methods (Chernozhukov et al. 2010). The rescaled plots in the right panels of
Fig. 3 highlight some features of the three procedures. The rescaled dual regression plot shows
that fitted lines obtained from dual regression are not subject to crossing in this example, whereas
several of the fitted conditional quantile lines obtained by quantile regression actually cross for
small values of household income. The rearranged quantile regression plots show non-crossing
conditional quantile functions obtained by applying the rearrangement procedure to original
quantile regression estimates.⁴ However, the predicted rearranged conditional quantile functions
in regions with few observations lead to regression functions that tend to exhibit a degree of 
curvature. This illustrates the fact that ex post monotonization does not constrain the rearranged 
estimates to be linear and is subject to judgement in the implementation, which are rather unap- 
pealing features that dual regression circumvents. Last, the more evenly spread dual regression 
conditional quantile functions reflect the fact that dual regression imposes a functional form to 
the functional coefficients. This is in contrast with the quantile regression methodology which is 
semiparametric by construction, and cannot incorporate information about the functional form 
of the quantile regression coefficients. This results in a loss of information since the parametric 
form of the coefficients is indeed known in the case of a linear heteroscedastic model. 
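The rearrangement step discussed above can be sketched at a single value of x. On an equally spaced grid of probability levels, the conditional distribution implied by a possibly non-monotone quantile curve is the share of grid points whose quantile lies at or below y, and inverting that distribution amounts to sorting the quantile values. The sketch below omits the small trimming constant used in the implementation described in the paper, and the crossing quantile values are made up.

```python
def implied_cdf(q_values, y):
    """Grid approximation of F(y | x): share of grid quantiles that are <= y."""
    return sum(1 for q in q_values if q <= y) / len(q_values)

def rearranged_quantiles(q_values):
    """Monotone rearrangement of a quantile curve on an equally spaced grid of u."""
    return sorted(q_values)

# Hypothetical fitted quantiles at one x for u = 0.1, 0.3, 0.5, 0.7, 0.9;
# the second and third values cross.
q_at_x = [1.0, 1.4, 1.3, 1.8, 2.0]

cdf_at_135 = implied_cdf(q_at_x, 1.35)
q_rearranged = rearranged_quantiles(q_at_x)
is_monotone = all(a <= b for a, b in zip(q_rearranged, q_rearranged[1:]))
```

Because the sort is carried out separately at each x, nothing ties the rearranged curves at neighbouring x values to a common linear form, which is the source of the curvature noted in the text.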



3.2. Simulations

In this section we give results of several Monte Carlo simulations in order to assess finite-sample
properties of dual regression. The model considered is a linear heteroscedastic model as in the
previous section and the experiment is calibrated to the Engel data empirical application. We
first compare dual regression estimates of a conditional distribution function to those obtained by
applying the rearrangement procedure of Chernozhukov et al. (2009, 2010), which serves as a
benchmark for dual regression estimates. The performance of dual regression in estimating conditional
quantile functions is also studied and compared to linear quantile regression methods.

The outcome $Y$ is generated from a Gaussian location-scale model calibrated to Engel's data
example

$$\begin{aligned}
Y_i &= \lambda_{11} + \lambda_{21} X_i + (\lambda_{12} + \lambda_{22} X_i) F^{-1}(u_i) \\
    &= \beta_0(u_i) + \beta_1(u_i) \cdot X_i,
\end{aligned}$$

where the disturbance $e$ is generated as $N(0, 1)$, and $X$ is drawn from a left-truncated normal
with truncation at 277. For the vector of $\lambda$ parameter values we take estimates given
by the method suggested in Koenker and Xiao (2002)⁵ and set $\lambda_1 = (86.35, -21.39)^\top$ and
$\lambda_2 = (0.55, 0.12)^\top$. The functional quantile regression parameters of $\beta(u) = (\beta_0(u), \beta_1(u))$

⁴ Letting $\hat{Q}_{Y|X}(u \mid x)$ denote the conditional $u$-th quantile function obtained by quantile regression, and for $\epsilon = 0.001$, the
first step of the rearrangement procedure is implemented by estimating the conditional distribution function
$\hat{F}(y \mid x) = \epsilon + \int_{\epsilon}^{1 - \epsilon} 1\{\hat{Q}_{Y|X}(u \mid x) \le y\} \, du$, which is monotone in the level $y$, and, in a second step, obtaining the rearranged conditional
quantile function $\hat{F}^{-1}(u \mid x) = \inf\{y : \hat{F}(y \mid x) \ge u\}$, which is monotone in the probability level $u$.
⁵ For a grid of quantile indices $\{u_1, \ldots, u_J\}$, $(\hat{\beta}_0(u_j), \hat{\beta}_1(u_j))$ are estimated by quantile regression, and $\lambda_1 = (\lambda_{11}, \lambda_{12})$
and $\lambda_2 = (\lambda_{21}, \lambda_{22})$ coefficients are set equal to the estimates obtained from linear regression of $(\hat{\beta}_0(u_j), \hat{\beta}_1(u_j))$ on
$\{(1, \Phi^{-1}(u_j)); j = 1, \ldots, J\}$, where $\Phi^{-1}$ is the inverse standard normal distribution function.
Figure 4: Simulation results for intercept (a) and covariate (b) coefficients: median estimates
across simulations and 90% confidence intervals. Solid lines are the dual regression estimates 
and dashed lines are the quantile regression estimates. The truth (dot-dash) is covered by the 
median estimates. 



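are therefore given by $\beta_0(u) = \lambda_{11} + \lambda_{12} \Phi^{-1}(u)$ and $\beta_1(u) = \lambda_{21} + \lambda_{22} \Phi^{-1}(u)$, and the
conditional distribution function by $u = \Phi\left( \frac{y - \lambda_{11} - \lambda_{21} x}{\lambda_{12} + \lambda_{22} x} \right)$.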
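The data generating process of the experiment can be sketched as follows. The coefficient values and the truncation point are those stated in the text, read with a decimal point (86.35, -21.39, 0.55, 0.12, truncation at 277). The mean and standard deviation of the untruncated normal for X are not given in this section, so `x_mean` and `x_sd` below are hypothetical placeholders, as are the function names.

```python
import random
from statistics import NormalDist

LAM1 = (86.35, -21.39)   # (lambda_11, lambda_12): coefficients of beta_0(u)
LAM2 = (0.55, 0.12)      # (lambda_21, lambda_22): coefficients of beta_1(u)
TRUNCATION = 277.0       # left truncation point for X

def location(x):
    """Conditional location lambda_11 + lambda_21 * x."""
    return LAM1[0] + LAM2[0] * x

def scale(x):
    """Conditional scale lambda_12 + lambda_22 * x, positive for x >= 277."""
    return LAM1[1] + LAM2[1] * x

def draw_x(rng, mean, sd):
    """Left-truncated normal draw by rejection sampling."""
    while True:
        x = rng.gauss(mean, sd)
        if x >= TRUNCATION:
            return x

def simulate(n, seed=0, x_mean=980.0, x_sd=520.0):
    """Return n triples (x, y, u); x_mean and x_sd are assumed values."""
    rng = random.Random(seed)
    phi = NormalDist()
    sample = []
    for _ in range(n):
        x = draw_x(rng, x_mean, x_sd)
        e = rng.gauss(0.0, 1.0)                        # Gaussian disturbance
        y = location(x) + scale(x) * e
        u = phi.cdf((y - location(x)) / scale(x))      # true conditional cdf at (x, y)
        sample.append((x, y, u))
    return sample

sample = simulate(235)
```

The truncation matters for the design: the scale term is positive only for x above roughly 178, so restricting X to values of at least 277 keeps the model monotone in e over the whole support.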

As in the previous section, dual regression estimates of the conditional CDF are $\hat{u} = F_n(e)$. As
a benchmark for dual regression, the conditional distribution function is also estimated by rearranged
quantile regression and given by $\hat{u}^{QR} = \int_0^1 1\{\hat{\beta}_0^{QR}(u) + \hat{\beta}_1^{QR}(u) \cdot x \le y\} \, du$, where
$\hat{\beta}_0^{QR}(u)$ and $\hat{\beta}_1^{QR}(u)$ are the linear quantile regression estimates of $\beta(u)$. Dual regression's
output delivers Lagrange multipliers yielding estimated coefficients $\hat{\beta}_0(u) = \hat{\lambda}_{11} + \hat{\lambda}_{21} F^{-1}(u)$ and
$\hat{\beta}_1(u) = \hat{\lambda}_{12} + \hat{\lambda}_{22} F^{-1}(u)$ of $\beta(u)$.

Table 1 reports a first set of results of our Monte Carlo simulations regarding the accuracy of
$u$ estimates across simulations. It reports average estimation errors of dual regression and rearranged
quantile regression, respectively, and their ratio in percentage terms. Average estimation
errors are measured in $L^p$ norms $\|\cdot\|_p$, $p = 1, 2$, and $\infty$, where for a measurable function
$F_e : \mathbb{R} \to [0, 1]$, $\|F_e\|_p = \{\int_{\mathbb{R}} |F_e(e)|^p \, de\}^{1/p}$. For each simulation, the estimation errors
$\|\hat{u} - \Phi(e)\|_p$ and $\|\hat{u}^{QR} - \Phi(e)\|_p$ are computed, and the errors are averaged across simulations
for each sample size. The results show that dual regression estimates systematically outperform
rearranged quantile regression estimates of the conditional distribution function. In these simulations
the spread in performance increases along with sample size. Whereas the reduction in
average estimation error is between 7 and 22%, depending on the norm, for sample size n = 100,
estimation error is reduced by up to 35% when n = 1000.
Table 1: $L^p$ estimation errors ×100 and ratios (in %) of estimation errors of dual and rearranged
quantile regression estimates of the conditional distribution function, for p = 1, 2 and ∞.

Sample size   L1_DR   L1_DR/L1_QR   L2_DR   L2_DR/L2_QR   L∞_DR   L∞_DR/L∞_QR
n = 100        4.11      93.26       5.60      92.22      14.80      78.52
n = 235        2.68      92.72       3.68      90.37      11.16      74.65
n = 500        1.84      91.91       2.53      88.46       8.52      69.11
n = 1000       1.31      91.59       1.81      87.71       6.55      65.06

Lp_DR and Lp_QR are the average $L^p$ errors of the dual and rearranged quantile regression estimates.

Table 2: Summary results of the simulation study for the intercept coefficient: RMAE across
quantile indices and sample sizes.

                          RMAE, by quantile index
Sample size   Method   τ = .1   τ = .25   τ = .5   τ = .75   τ = .9
n = 100       DR        4.23     3.78      3.55     3.72      4.14
              QR        4.77     4.19      3.95     4.15      4.74
n = 235       DR        3.40     3.02      2.81     2.96      3.32
              QR        3.71     3.30      3.15     3.29      3.73
n = 500       DR        2.88     2.52      2.28     2.42      2.74
              QR        3.04     2.71      2.60     2.70      3.04
n = 1000      DR        2.52     2.17      1.91     2.02      2.33
              QR        2.55     2.28      2.18     2.27      2.54

RMAE, square root of mean absolute error across simulations; DR, dual regression; QR, quantile regression.

Table 3: Summary results of the simulation study for the income coefficient: RMAE across
quantile indices and sample sizes.

                         RMAE, by quantile index
Sample size   Method    τ = .1    τ = .25    τ = .5    τ = .75    τ = .9
n = 100       DR         0.16      0.15       0.14      0.15       0.16
              QR         0.18      0.16       0.15      0.16       0.18
n = 235       DR         0.13      0.12       0.11      0.12       0.13
              QR         0.14      0.13       0.12      0.13       0.14
n = 500       DR         0.11      0.10       0.09      0.10       0.11
              QR         0.12      0.10       0.10      0.10       0.12
n = 1000      DR         0.09      0.08       0.08      0.08       0.09
              QR         0.10      0.09       0.08      0.09       0.10

RMAE, square root of mean absolute error across simulations; DR, dual regression; QR, quantile regression.



Figure 4 illustrates the results of our simulations with n = 235, the number of observations in
Engel's data. For each method, the solid line is the median estimate across simulations of the
functional intercept coefficient, $\beta_0(u)$, and of the functional covariate coefficient,
$\beta_1(u)$. The 90% confidence bands are constructed pointwise by taking the .05 and .95 quantiles
of the estimates across simulations. For both coefficients, a striking feature is that the dual
regression bands follow the median estimates uniformly over the entire quantile process, whereas
the quantile regression confidence bands tend to get wider at extreme values of the probability
index u.
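
The pointwise construction of the median line and the bands can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the layout of `estimates` (one row per simulation, one column per grid point of u) and the linear-interpolation quantile rule are hypothetical choices, and the defaults correspond to the 5% and 95% quantiles that bound a 90% band.

```python
import statistics

def empirical_quantile(sorted_values, q):
    """Quantile by linear interpolation between order statistics."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1.0 - frac) + sorted_values[hi] * frac

def pointwise_band(estimates, lower=0.05, upper=0.95):
    """Median and pointwise quantile band across simulations.

    estimates[s][k] is the coefficient estimate from simulation s at the
    k-th point of a grid of probability indices u.
    """
    medians, lows, highs = [], [], []
    for k in range(len(estimates[0])):
        column = sorted(row[k] for row in estimates)
        medians.append(statistics.median(column))
        lows.append(empirical_quantile(column, lower))
        highs.append(empirical_quantile(column, upper))
    return medians, lows, highs
```

Because the band is built separately at each grid point, it is a pointwise band and not a uniform one.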

Tables 2 and 3 summarize the results of the simulations for all sample sizes. For each functional
coefficient, we compute the root mean absolute error (RMAE) of the estimates obtained by each
method: errors are computed at the quantile indices {0.1, 0.25, 0.5, 0.75, 0.9} for each
replication, and the summary statistic is then computed across replications. In all cases dual
regression yields estimates with lower RMAE, which corroborates the results shown in Fig. 4.
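
A minimal sketch of the RMAE summary follows, using the definition given in the table notes (the square root of the mean absolute error across simulations). The function names and the dictionary layout keyed by quantile index are hypothetical.

```python
import math

QUANTILE_INDICES = (0.1, 0.25, 0.5, 0.75, 0.9)

def rmae(estimates, true_value):
    """Square root of the mean absolute error across simulation replications."""
    mean_abs_error = sum(abs(b - true_value) for b in estimates) / len(estimates)
    return math.sqrt(mean_abs_error)

def rmae_by_index(estimates_by_index, truth_by_index):
    """RMAE of one functional coefficient at each quantile index.

    estimates_by_index[tau] lists the estimates of the coefficient at index
    tau, one per replication; truth_by_index[tau] is its true value.
    """
    return {tau: rmae(estimates_by_index[tau], truth_by_index[tau])
            for tau in QUANTILE_INDICES}
```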

4. Estimation of the Single Equation Instrumental Variables Model 

One approach to 'structural', 'ceteris paribus', or 'causal' modeling is estimation of the
single equation instrumental variables model

$$ Y = H^*(X, e), $$

where $H^*(x, e)$ is strictly increasing in e, and e is no longer assumed to be independent of X
but is independent of a sufficiently effective instrument Z. In general X and Z are both
vectors, with the possibility that some variables appear in both, but in the example below there is
no overlap and both variables are scalar. Since $H^*$ is strictly increasing in its second argument,
its inverse is well defined and relates e to Y at each value of X:

$$ e = H^{*-1}(X, Y). $$

The analogy with the construction of the statistical distribution $F(Y \mid X)$ is made apparent by
considering the strictly monotone transformation

$$ F(e) = F(H^{*-1}(X, Y)) = F^*(Y \mid X), \tag{32} $$

where the "star" notation in $F^*(Y \mid X)$ indicates that e is not independent of X and that we are
here dealing with a 'structural' distribution function rather than a statistical distribution function.

As in the cases considered above, it is convenient to work not with the marginally uniform
random variable generated by $F^*(Y \mid X)$ but with e, whose support is the entire real line since
Y is continuously distributed. Thus e will be independent of Z with distribution $F(e)$, and at
a fixed value of X, Y will be a strictly monotonic function of e (this function depending, of
course, on the value of X).

4.1. First Approach: Indirect Instrumentation

A first approach to the problem simply replicates the steps of the formulation of generalized
dual regression (cf. Section 2.3) for the statistical distribution function $F(Y \mid X)$ and inserts a
modification appropriate to the definition of $F^*(Y \mid X)$. We thus start from the structural
representation

$$ Y = H^*_x(e), $$

where $H^*_x(e)$ is the structural function and e is not independent of X. The construction of
$C^*(x, e)$ occurs in parallel to the previous development, and as in that case we assume there
exists the expansion

$$ C^*(x, e) = \sum_j \beta^*_j(x) h_j(e), \qquad j = 1, \ldots, J. $$

If we proceed as before, the key elements being the centering of x, the writing of the functions
$\beta^*_j(x)$ as linear in x,

$$ \beta^*_j(x) = \gamma^*_j + \lambda^*_j \cdot x, $$

and the treatment of $(\gamma^*, \lambda^*)$ as parameters, we arrive at the system

$$ y_i = \gamma_0 + (\lambda^*_0 \cdot x_i) + (\lambda^*_1 \cdot x_i) e_i + \sum_{j=2}^{J} (\lambda^*_j \cdot x_i) h_j(e_i), \qquad i = 1, \ldots, n, \tag{33} $$

$$ X_j^\top h_j(e) = s_j, \qquad j = 0, \ldots, J, \tag{34} $$

where the notation $X_j$ indicates that X includes an intercept for j = 0, 1 but not otherwise. Each
parameter cum Lagrange multiplier in $(\gamma^*, \lambda^*)$ in equation (33) has a corresponding
constraint in equation (34). In the 'nonstructural' case $s_j = 0$, but here the value of $s_j$ is
unknown, being determined by the nature of the dependency between X and e. Obviously, if the
correct value of $s_j$ were known, the correct values of the Lagrange multipliers would follow.



However, it is not necessary to know $s_j$ to obtain the values of the Lagrange multipliers. One
approach is to write

$$ X = E(X \mid Z) + (X - E(X \mid Z)), $$

and, for $j = 0, \ldots, J$, to rewrite equation (34) as

$$ [E(X \mid Z) + (X - E(X \mid Z))]^\top h_j(e) = E(X \mid Z)^\top h_j(e) + (X - E(X \mid Z))^\top h_j(e) = s_j, $$

$$ 0 + (X - E(X \mid Z))^\top h_j(e) = s_j. $$

Thus, upon substituting for $s_j$ in (34), for $i = 1, \ldots, n$ and $j = 0, \ldots, J$, the system

$$ y_i = \gamma_0 + \gamma_1 e_i + (\lambda^*_0 \cdot x_i) + (\lambda^*_1 \cdot x_i) e_i + \sum_{j=2}^{J} (\lambda^*_j \cdot x_i) h_j(e_i), \tag{35} $$

$$ E(X \mid Z)^\top h_j(e) = 0, \tag{36} $$

has the same solution as the system (33)-(34), since the Lagrange multipliers of the constraints (36)
are the same as those of (34).



Classical two stage least squares is obtained from (35)-(36) by setting $\lambda^*_j = 0$ for j > 0,
which is achieved by removing the corresponding elements of the orthogonality constraint equation
(36), and by constructing $E(X \mid Z)$ by linear regression.
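
For reference, the classical special case can be sketched in its textbook form for a scalar X and a scalar Z. This is the standard two stage least squares computation, not the dual regression system itself, and the function names and toy data are hypothetical. The toy error e is constructed to be exactly uncorrelated with z in the sample but correlated with x.

```python
def ols_line(x, y):
    """Intercept and slope of the least squares line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope

def two_stage_least_squares(y, x, z):
    """2SLS with one endogenous regressor x and one instrument z."""
    a, b = ols_line(z, x)              # first stage: linear estimate of E(X|Z)
    x_hat = [a + b * zi for zi in z]
    return ols_line(x_hat, y)          # second stage: y on the fitted values

# Toy data: y = 1 + 2x + e, with x = z + e endogenous.
z = [-1.0, -1.0, 1.0, 1.0]
e = [1.0, -1.0, 1.0, -1.0]
x = [zi + ei for zi, ei in zip(z, e)]
y = [1.0 + 2.0 * xi + ei for xi, ei in zip(x, e)]
print(two_stage_least_squares(y, x, z))  # intercept and slope from 2SLS
print(ols_line(x, y))                    # ordinary least squares, for comparison
```

On these data the instrumented slope recovers the structural value 2 while ordinary least squares does not, because x is correlated with e and z is not.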
4.2. A Second Approach: Direct Instrument Orthogonality

The approach of the preceding section exploits the Z independence condition by rendering a 
statistic of X conditional on Z orthogonal to the basis functions of the stochastic element e. A 
more direct approach would be to render Z itself orthogonal to functions of e while providing 
a direct representation of Y in terms of X and e. In some ways this would be a more literal 
interpretation of the intuitive directive to maximize e subject to Z independent of e and Y 
monotonic in e at each value of X. 

To fix ideas, we will give e a parametric representation by writing: 

$$ e = e(y, x, \beta). \tag{37} $$

Proceeding parametrically permits easy enforcement of the monotonicity relation (of e and Y)
and the exclusion restriction (of Z from the structural function $Y = H^*(X, e)$).

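Parallel to our previous formulations, our optimization problem becomes

$$ \max_{\beta} \left\{ y^\top e(y, x, \beta) \;\middle|\; Z^\top e(y, x, \beta) = 0, \;\; Z^\top (e(y, x, \beta)^2 - 1) = 0 \right\}, \tag{38} $$

with corresponding Lagrangean:

$$ \mathcal{L} = \sum_{i=1}^{n} y_i e(y_i, x_i, \beta) - \lambda_1 \cdot \sum_{i=1}^{n} z_i e(y_i, x_i, \beta) - \frac{1}{2} \lambda_2 \cdot \sum_{i=1}^{n} z_i (e(y_i, x_i, \beta)^2 - 1). \tag{39} $$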

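So the first-order conditions are:

$$ \frac{\partial \mathcal{L}}{\partial \beta} = \sum_{i=1}^{n} \frac{\partial \mathcal{L}}{\partial e_i} \frac{\partial e_i}{\partial \beta} = 0. \tag{40} $$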

Proceeding observation by observation, this can be written:

$$ \sum_{i=1}^{n} \left[ y_i - \lambda_1 \cdot z_i - (\lambda_2 \cdot z_i) e_i \right] \frac{\partial e_i}{\partial \beta} = 0. \tag{41} $$

The term $[y_i - \lambda_1 \cdot z_i - (\lambda_2 \cdot z_i) e_i]$ in equation (41) represents the marginal
benefit of increasing $e_i$, with the contribution $y_i$ arising from the objective function and the
two λ terms representing the marginal costs of satisfying the two orthogonality constraints.

Generalization of the formulation in (38) and the interpretive equation (41) to a full set of basis
functions for e is straightforward.

If we impose the functional structure of our original optimization problem of equation (3), the
specific form of $e(y, x, \beta)$ is given by:

$$ e(y, x, \beta) = \frac{y - \beta_1 \cdot x}{\beta_2 \cdot x}. \tag{42} $$

If this expression is substituted into equation (41), and if Z and X coincide, as they do in the
exogenous case of equation (3) and Section 2, some calculation reveals that $\lambda = \beta$, as might
be expected. The computations for the exogenous case demonstrate that if $e(y, x, \beta)$ is freely
chosen (i.e. without a functional form restriction), equation (41) results with the coefficients β
taking the values λ. So if we impose this form a priori the solution is unchanged.

Of course, when Z and X are different, it is an assumption that $e(y, x, \beta)$ takes the form given
in equation (42), but to analyze the case that would obtain were there to be no endogeneity seems a
natural starting point.

Let k = dim(X) and m = dim(Z). Then there are 2k elements of β to be determined in order
to define equation (41), and 2m constraints involving Z in the optimization problem (38); the
latter have 2m corresponding multipliers λ.



When k = m, the solution of problem (38) for β is determined solely by the constraints and
thus coincides with the method of moments representation

$$ Z^\top e(y, x, \beta) = 0, \qquad Z^\top (e(y, x, \beta)^2 - 1) = 0, \tag{43} $$

cf. the exogenous case (10). That is, these 2m equations alone are sufficient to determine the
2k unknowns β, since m = k. Adding equation (41) to this system in this 'just-identified' case
provides a solution for λ: there are then 2k equations of type (41) and 2k equations of type (43)
that determine the 2k values of λ and the 2k values of β. As in the exogenous case, the joint
asymptotic distribution of β, λ follows from this representation.
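
To make the just-identified case concrete, consider the simplest setting k = m = 1 with no intercept, a simplification introduced here purely for illustration. Assuming the location-scale form e = (y - b1 x)/(b2 x) with scalar coefficients b1 and b2, and writing r = y/x, the two sample moment conditions sum(z e) = 0 and sum(z (e^2 - 1)) = 0 can be solved in closed form. The sketch below assumes x is nonzero, sum(z) is nonzero and the quantity under the square root is positive; the function names and the toy numbers are hypothetical.

```python
import math

def solve_scalar_moments(y, x, z):
    """Solve sum(z*e) = 0 and sum(z*(e**2 - 1)) = 0 for e = (y - b1*x)/(b2*x)."""
    r = [yi / xi for yi, xi in zip(y, x)]   # e = (r - b1) / b2
    sz = sum(z)
    b1 = sum(zi * ri for zi, ri in zip(z, r)) / sz
    b2 = math.sqrt(sum(zi * (ri - b1) ** 2 for zi, ri in zip(z, r)) / sz)
    return b1, b2

def sample_moments(y, x, z, b1, b2):
    """Left-hand sides of the two moment conditions at (b1, b2)."""
    e = [(yi - b1 * xi) / (b2 * xi) for yi, xi in zip(y, x)]
    m1 = sum(zi * ei for zi, ei in zip(z, e))
    m2 = sum(zi * (ei ** 2 - 1.0) for zi, ei in zip(z, e))
    return m1, m2

# Made-up numbers, for illustration only.
y = [1.0, 2.5, 2.0, 5.0]
x = [1.0, 1.0, 2.0, 2.0]
z = [1.0, 2.0, 1.0, 2.0]
b1, b2 = solve_scalar_moments(y, x, z)
print(b1, b2, sample_moments(y, x, z, b1, b2))
```

Two equations determine the two unknowns here without reference to the objective function, which is the sense in which the constraints alone pin down β when m = k.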

When the dimensions of Z and X differ, say m = dim(Z) and k = dim(X) with m ≠ k, then there
are 2m equations of type (43) in 2k unknowns, and 2k equations of type (41) in 2m unknowns.



5. Conclusion 

When the problem of estimating $F(Y \mid X)$ is conceived as assigning values of the stochastic
elements subject to the constraints of independence from X and monotonicity in Y, a series of
nonlinear programs results. The simplest member of this series is an alternative to, and in some
ways a generalization of, the linear program that results from the dual characterization of quantile
regression. Hence simple dual regression serves many of the same purposes as quantile regression.

These same principles can be applied to the characterization of the single equation instrumental
variables model that is popular in econometrics. When this is done, a complete characterization
of the structural distribution function $F^*(Y \mid X)$ results.

As is well understood in mathematical programming, dual solutions provide lower bounds on the 
values obtained by primal problems. In the generic form of the problems we have considered here 
there is no gap between the primal and dual values; hence in econometrics these problems are 
said to display "point identification". We conjecture that the problems without point identification 
do have gaps between their dual and primal values, and that this characterization will enhance
our understanding.